Learning Apache Apex: Real-time streaming applications with Apex by Thomas Weise & Munagala V. Ramanath & David Yan & Kenneth Knowles

Learning Apache Apex: Real-time streaming applications with Apex by Thomas Weise & Munagala V. Ramanath & David Yan & Kenneth Knowles

Author:Thomas Weise & Munagala V. Ramanath & David Yan & Kenneth Knowles [Weise, Thomas]
Language: eng
Format: epub
Tags: COM018000 - COMPUTERS / Data Processing, COM048000 - COMPUTERS / Systems Architecture / Distributed Systems and Computing, COM005000 - COMPUTERS / Enterprise Applications / General
Publisher: Packt Publishing
Published: 2017-11-30T00:00:00+00:00


Performance optimizations

In addition to partitioning, there are other aspects of how an Apex application is deployed on the cluster that are under the control of the application writer which can be used to further improve performance.

For example, consider consecutive operators opA and opB in a DAG. If the former generates a large volume of data on its output port and the latter performs some sort of filtering or aggregation operation so that the volume of data leaving its output port is considerably diminished, it may make sense to co-locate them in the same node to conserve network bandwidth; this is called NODE_LOCAL locality.

Additionally, tuple serialization and deserialization overhead (which can be considerable in some cases) can be eliminated if they could be co-located within the same container; this is called CONTAINER_LOCAL locality. The following figure shows different options of co-location of two operators:



Download



Copyright Disclaimer:
This site does not store any files on its server. We only index and link to content provided by other sites. Please contact the content providers to delete copyright contents if any and email us, we'll remove relevant links or contents immediately.